DMTCP: Bringing Checkpoint-Restart to Python
Abstract
DMTCP (Distributed MultiThreaded CheckPointing) is a mature checkpoint-restart package. It operates in user-space without kernel privilege, and adapts to application-specific requirements through plugins. While DMTCP has been able to checkpoint Python and IPython "from the outside" for many years, a Python module has recently been created to support DMTCP. IPython support is included through a new DMTCP plugin. A checkpoint can be requested interactively within a Python session, or under the control of a specific Python program. Further, the Python program can execute specific Python code prior to checkpoint, upon resuming (within the original process), and upon restarting (from a checkpoint image). Applications of DMTCP are demonstrated for: (i) Python-based graphics using VNC; (ii) a Fast/Slow technique to use multiple hosts or cores to check one Cython computation in parallel; and (iii) a reversible debugger, FReD, with a novel reverse-expression watchpoint feature for locating the cause of a bug.
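The three hook points named in the abstract (code run prior to checkpoint, upon resuming within the original process, and upon restarting from a checkpoint image) can be illustrated with a toy model. The sketch below does not use DMTCP or its Python module. `CheckpointHooks`, `checkpoint`, and `restart` are hypothetical names invented for illustration, and `pickle` stands in for a real checkpoint image, which would capture the whole process rather than a single dictionary.

```python
import pickle


class CheckpointHooks:
    """Hypothetical registry for the three hook points (not the DMTCP API)."""

    def __init__(self):
        self.pre_checkpoint = []  # run just before the image is written
        self.on_resume = []       # run in the original process after the checkpoint
        self.on_restart = []      # run in a process restored from the image


def checkpoint(state, hooks, log):
    """Run pre-checkpoint hooks, serialize the state, then run resume hooks."""
    for hook in hooks.pre_checkpoint:
        hook(state, log)
    image = pickle.dumps(state)  # toy stand-in for a checkpoint image
    for hook in hooks.on_resume:
        hook(state, log)
    return image


def restart(image, hooks, log):
    """Rebuild the state from the image, then run restart hooks."""
    state = pickle.loads(image)
    for hook in hooks.on_restart:
        hook(state, log)
    return state


hooks = CheckpointHooks()
hooks.pre_checkpoint.append(lambda state, log: log.append("pre-checkpoint"))
hooks.on_resume.append(lambda state, log: log.append("resume"))
hooks.on_restart.append(lambda state, log: log.append("restart"))

log = []
state = {"step": 3}
image = checkpoint(state, hooks, log)

state["step"] = 4                      # the original process keeps running
restored = restart(image, hooks, log)  # the restored copy sees the saved state
```

In this model the resume hooks run in the process that took the checkpoint, while the restart hooks run only when a saved image is loaded, which mirrors the distinction the abstract draws between resuming and restarting.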
Similar Resources
A Generic Checkpoint-Restart Mechanism for Virtual Machines
It is common today to deploy complex software inside a virtual machine (VM). Snapshots provide rapid deployment, migration between hosts, dependability (fault tolerance), and security (insulating a guest VM from the host). Yet, for each virtual machine, the code for snapshots is laboriously developed on a per-VM basis. This work demonstrates a generic checkpoint-restart mechanism for virtual ma...
DMTCP: Scalable User-Level Transparent Checkpointing for Cluster Computations
As the size of clusters increases, failures are becoming increasingly frequent. Applications must become fault tolerant if they are to run for extended periods of time. We present DMTCP (Distributed MultiThreaded CheckPointing), the first user-level distributed checkpointing package not dependent on a specific message passing library. This contrasts with existing approaches either specific to l...
Temporal Debugging using URDB
A new style of temporal debugging is proposed. The new URDB debugger can employ such techniques as temporal search for finding an underlying fault that is causing a bug. This improves on the standard iterative debugging style, which iteratively re-executes a program under debugger control in the search for the underlying fault. URDB acts as a meta-debugger, with current support for four widely ...
Adapting the DMTCP Plugin Model for Checkpointing of Hardware Emulation
Checkpoint-restart is now a mature technology. It allows a user to save and later restore the state of a running process. The new plugin model for the upcoming version 3.0 of DMTCP (Distributed MultiThreaded Checkpointing) is described here. This plugin model allows a target application to disconnect from the hardware emulator at checkpoint time and then reconnect to a possibly different hardwa...
Temporal Debugging: Automating Time Travel Debugging with URDB
This work addresses two classical problems in debugging. First, while some excellent reversible debuggers have been built for C, C++, Java, Standard ML, other languages including MATLAB, Python and Perl lack such reversible debuggers. To solve this, this work contributes a new temporal debugging approach and a new software package, URDB (Universal Reversible DeBugger), which can extend the nati...